ABSTRACT

This work investigates the application of evolutionary programming for automatically configuring neural network
architectures for pattern classification tasks. The evolutionary programming search procedure implements a parallel
nonlinear regression technique and represents a powerful method for evaluating a multitude of neural network model
hypotheses. The evolutionary programming search is augmented with the Solis & Wets random optimization method,
thereby maintaining the integrity of the stochastic search while taking into account empirical information about the
response surface. A network architecture is proposed which is motivated by the structures generated in projection pursuit
regression and the cascade-correlation learning architecture. Results are given for the 3-bit parity, normally distributed
data, and the T-C classifier problems.


1.  INTRODUCTION 

Dynamic artificial neural networks (DANNs) represent an alternative training methodology which not only
optimizes a weight set for a specified network architecture, but also allows the network architecture to be modified during
the training process. The necessity for this type of training generally results from the trial-and-error process undertaken by
the network designer on highly dimensioned data sets which lack obvious feature vectors. The usual objective of DANN
training methodologies is to minimize an energy function that adequately describes the network topology as well as the
mean sum-squared pattern error. This type of training approach results in parsimonious network structures with either a
reduced number of redundant hyperplanes, minimized connectivity, or both. The benefits of the resulting networks include
reduced throughput times for real-time signal processing applications and potentially better generalization capabilities by
the avoidance of overfitting the training data.

The DANN training philosophy may be roughly broken down into two classes: (1) those that modify their
connectivity and (2) those that modify the number of hidden units. Representative examples of the first class of DANN
training algorithms include weight decay and weight elimination. Representative examples of the latter class of DANN
training algorithms include the dynamic node creation (DNC) algorithm, an upgrade to the DNC algorithm by Hirose et
al. which also deletes units, and the cascade-correlation (CC) learning architecture. The CC training approach is unique
in that it fixes the input-to-hidden unit weights and only modifies the weights of the output units. The idea of adding
additional units to achieve better function approximations is similar to projection pursuit regression techniques.

The use of evolutionary search methods is becoming prevalent as a network construction technique. Network
architectures have been "evolved" during the training process using genetic algorithms, evolutionary strategies, and
evolutionary programming. Both the genetic algorithm and evolutionary strategy approaches incorporated the
backpropagation algorithm for weight adjustment, while the evolutionary programming approach implemented a hybrid
stochastic search method.

The goal of this investigation is to determine the feasibility of an evolutionary search method, evolutionary
programming, for the automatic design of a general feedforward neural network architecture. These architectures have
three types of units (input, hidden, and output), as opposed to three types of layers. This distinction is made since each
additional hidden unit may be connected to all of the previous units (both input and hidden) in the network. Citing the
benefits of reduced connectivity given above, the resulting structures will not necessarily be fully connected in a CC sense.
The next section briefly discusses the projection pursuit and CC classifier construction techniques. Note that other
constructive approaches such as adaptive kernel estimators are equally applicable for classifier determination. Finally, a
hybrid learning architecture is proposed which, using evolutionary programming training methods, incorporates structural
aspects of both the CC and projection pursuit architectures.


2.  CLASSIFIER  CONSTRUCTION  MODELS 


2.1 A connectionist representation of projection pursuit

Projection pursuit regression (PPR) structures are nonparametric models resulting from a successive refinement
training process. Pattern classification using this regression technique has been demonstrated by Flick et al. PPR
generates an approximation $f_p(\mathbf{x})$ as a sum of empirically determined smooth functions $g_j$ of linear
combinations of the input vector, as described by

$$f_p(\mathbf{x}) = \sum_{j=1}^{p} g_j(\mathbf{a}_j^T \mathbf{x})$$

Training progresses by using a successive refinement concept to incrementally determine a ridge function
$g_j(\mathbf{a}_j^T \mathbf{x})$ with corresponding unit vector $\mathbf{a}_j$ which minimizes

$$\sum_{i} \left[ r_i - g_j(\mathbf{a}_j^T \mathbf{x}_i) \right]^2$$

where $r_i$ represents the current set of residuals. Flick et al. point out that one problem with PPR is the inability to backfit
the data, "readjusting the ridge functions used in earlier projections." A connectionist view can be taken of this
regression technique with the resulting three-layer architecture illustrated in Fig. 1.


Fig.  1.  A  connectionist  representation  of  the  projection  pursuit  regression  function. 

2.2  The  cascade-correlation  algorithm 

The successive refinement technique employed in PPR is similar to the construction method used in the CC
learning architecture. The cascade-correlation learning architecture adds hidden units as necessary in an effort to minimize
the residual errors. Significant differences between the CC and PPR algorithms are that the cascaded nodes can result in
more complex nonlinear mappings whereas the PPR method selects an appropriate nonlinear mapping (and unit vector) to
minimize the residuals at each generation. In the CC learning architecture a candidate pool of hidden units are individually
trained to maximize the covariance between each unit's output and the residual output error over all output units. Once
trained, the "best" hidden unit is incorporated into the network with fixed input weights, with subsequent weight
modifications occurring on the output units. The CC mapping is described by the network equations


output unit: $y_j = g_j\left( \sum_{i=1}^{m+n} w_{ij} x_i \right)$ for $j > m+n$;

hidden unit: $x_j = g_j\left( \sum_{i=1}^{j-1} w_{ij} x_i \right)$ for $j > m$

where $w$ corresponds to the weight matrix and $m$ and $n$ represent the size of the input vector and number of hidden units,
respectively. For feedforward architectures, the weight matrix is nearly upper triangular. The CC architecture is shown in
Fig. 2.

Fig. 2. The cascade-correlation architecture. The boxes indicate weights which
are frozen; only the output weights are modified.


The difference between the CC and PPR function approximation techniques previously discussed becomes more
evident if the output unit equation is rewritten with separate summations for the input and hidden units and expanded
accordingly

$$y_j = g\left( \sum_{i=1}^{m+n} w_{ij} x_i \right)$$

$$y_j = g\left( \sum_{i=1}^{m} w_{ij} x_i + \sum_{k=m+1}^{m+n} w_{kj} x_k \right)$$

$$y_j = g\left( \sum_{i=1}^{m} w_{ij} x_i + \sum_{k=m+1}^{m+n} w_{kj}\, g\left( \sum_{i=1}^{k-1} w_{ik} x_i \right) \right)$$

where the subscript on $g$ has been dropped, signifying the activation functions are the same (this is not always the case).
Instead of the linear combination of nonlinear mappings of projections of the input vector x as generated using the PPR
algorithm, the CC architecture yields a nonlinear mapping of a weighted linear combination of nonlinear mappings of
projections of the input vector x. As the number of hidden units n increases, more complex nonlinear manifolds result. In
comparison to the PPR mapping, this capability may yield better pattern classification results on highly non-convex data
sets. This deficiency of PPR with respect to its ability to achieve highly nonlinear mappings is also pointed out by Huber,
who states "that PP is poorly suited to deal with highly nonlinear structures."


3.  EVOLUTIONARY  PROGRAMMING 

Evolutionary programming (EP) provides a powerful framework for simultaneously evaluating neural network
models and their parameterizations. Like PPR methods, the EP search strategy can be computationally demanding. EP is a
systematic, multi-agent, stochastic search technique proposed by Fogel et al. EP has been used to generate finite state
machines, autoregressive moving average (ARMA) models, probability density mixture models, and recurrent
perceptrons.

The EP paradigm can be described by the following algorithm:

1. Form an initial population $P = [x_0, x_1, \ldots, x_{2N-1}]$ of size 2N by randomly initializing each
n-dimensional solution vector $x_i$. A user-specified search domain for $x_i$ may be imposed.

2. Assign a cost $J_i$ to each element $x_i$ in the population based on the associated objective
function $J_i = \phi(x_i)$, where $\phi: R^n \to R$.

3. Reorder the population in descending order based on the number of wins generated from a
stochastic competition process. Wins are generated by randomly selecting other members
$x_j$ in the population and incrementing the win counter $w_i$ if $J_i < J_j$.

4. Generate offspring $(x_N, \ldots, x_{2N-1})$ from the N highest ranked elements $(x_0, \ldots, x_{N-1})$ in the
population by modifying each element $x_{i,j}$ with a random perturbation
$\delta_{ij} \sim N(0, S_f \cdot J_i + \beta_j)$ such that $x_{i+N,j} = x_{i,j} + \delta_{ij}$.

5. Loop to step 2.
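The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `evolve`, the number of opponents per competition, the tie-break on cost, and the `scale` and `offset` values (standing in for the variance terms of step 4) are all hypothetical choices.

```python
import math
import random


def evolve(cost, dim, n_parents=10, generations=200, bounds=(-1.0, 1.0),
           scale=1.0, offset=1e-6, opponents=5, seed=0):
    """Minimal EP loop: score, compete, keep N winners, mutate them."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: population of size 2N, each a random n-dimensional vector.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(2 * n_parents)]
    for _ in range(generations):
        # Step 2: assign a cost J_i to every member.
        costs = [cost(x) for x in pop]
        # Step 3: stochastic competition; a win is scored whenever the
        # member's cost is lower than that of a randomly drawn opponent.
        wins = []
        for i in range(len(pop)):
            w = 0
            for _ in range(opponents):
                j = rng.randrange(len(pop))
                if costs[i] < costs[j]:
                    w += 1
            wins.append(w)
        # Rank by wins (descending); ties broken by cost (an assumption).
        order = sorted(range(len(pop)), key=lambda i: (-wins[i], costs[i]))
        parents = [pop[i] for i in order[:n_parents]]
        parent_costs = [costs[i] for i in order[:n_parents]]
        # Step 4: one offspring per parent; the perturbation variance
        # grows with the parent's cost (scale * J_i + offset).
        offspring = []
        for x, j_x in zip(parents, parent_costs):
            sigma = math.sqrt(scale * abs(j_x) + offset)
            offspring.append([xi + rng.gauss(0.0, sigma) for xi in x])
        pop = parents + offspring  # Step 5: loop to step 2.
    costs = [cost(x) for x in pop]
    best = min(range(len(pop)), key=lambda i: costs[i])
    return pop[best], costs[best]
```

Calling `evolve(lambda x: sum(v * v for v in x), dim=2)` searches a simple quadratic bowl. Because the perturbation variance is tied to the parent's cost, the steps shrink as the population approaches a low-cost region.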


The Bohachevsky function $f(\mathbf{x}) = x_1^2 + 2x_2^2 - 0.3\cos(3\pi x_1) - 0.4\cos(4\pi x_2) + 0.7$ is proposed as a response surface to
demonstrate the benefits of EP as a global optimization strategy. The transcendental terms generate many local minima
within the interval $\mathbf{x} \in [-1,1]^2$ while the quadratic terms dominate the surface structure outside of this interval. A unique
global minimum exists at $\mathbf{x} = (0,0)$. A trajectory of the best population member at each generation during a search on the
Bohachevsky surface is superimposed on the Bohachevsky contours as shown in Fig. 4. Since the search is stochastic, it is
expected that this trajectory will vary for every trial.
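The response surface is easy to evaluate directly. The sketch below codes the Bohachevsky function as written above and adds a hypothetical helper, `count_grid_local_minima`, that counts grid points lower than all eight neighbours; the grid resolution is an arbitrary choice and the count is only a coarse indication of how multimodal the surface is inside the unit square.

```python
import math


def bohachevsky(x1, x2):
    """f(x) = x1^2 + 2*x2^2 - 0.3*cos(3*pi*x1) - 0.4*cos(4*pi*x2) + 0.7"""
    return (x1 ** 2 + 2.0 * x2 ** 2
            - 0.3 * math.cos(3.0 * math.pi * x1)
            - 0.4 * math.cos(4.0 * math.pi * x2)
            + 0.7)


def count_grid_local_minima(f, lo=-1.0, hi=1.0, steps=201):
    """Count interior grid points strictly lower than all 8 neighbours."""
    h = (hi - lo) / (steps - 1)
    grid = [[f(lo + i * h, lo + j * h) for j in range(steps)]
            for i in range(steps)]
    count = 0
    for i in range(1, steps - 1):
        for j in range(1, steps - 1):
            centre = grid[i][j]
            if all(centre < grid[i + di][j + dj]
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0)):
                count += 1
    return count
```

At the origin the two cosine terms contribute -0.3 and -0.4, which cancel the constant 0.7, so the global minimum value is zero up to rounding.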


Fig.  4.  Trajectory  of  the  best  point  in  the  population  during  the  evolutionary  search 
process  on  the  Bohachevsky  surface. 


2.3  The  neural  network  classifier  model 


A feedforward neural network may be considered as a functional mapping $f: X \to Y$, where $X$ and $Y$ are the
real-valued input and output spaces, subject to a topology T(N,C) as defined by the neuron inter-connectivity C over the
number of available neurons N. Similar to the PPR algorithm, the mapping network may even contain variable types of
activation functions g so that $g \in G$, where G is the set of possible activation functions. Motivated by the PPR and CC
structures, as well as the better generalization capabilities of parsimonious structures, a general structure such as that
shown in Fig. 3(a) is proposed as a pattern classifier. It is the intention of this work to generate such architectures using
the evolutionary programming search strategy.


Differences between the proposed architecture and the CC structure include the ability to modify all connection
weights throughout the learning process. This is different than minimizing the residuals with each additional hidden unit
as accomplished in the CC learning architecture. To achieve less than full connectivity, a connectivity array C is specified
as shown in Fig. 3(b), where the row index corresponds to "from" units and each column index corresponds to "to" units
(i.e., C[from][to]). If each connectivity-weight product is combined as $\hat{w}_{ij} = c_{ij} \cdot w_{ij}$, then the nonlinear mapping is
similar to the CC mapping (assuming $c_{ij} = 1$) and can be described as

$$y_j = g_j\left( \sum_{i=1}^{m} \hat{w}_{ij} x_i + \sum_{k=m+1}^{m+n} \hat{w}_{kj}\, g_k\left( \sum_{i=1}^{k-1} \hat{w}_{ik} x_i \right) \right)$$

where variable activation functions have been incorporated. While activation functions could easily be incorporated as an
additional evolutionary search parameter, the results presented in this work maintained the same activation functions on the
output and hidden units since this is a preliminary investigation. Training must not only determine the network weights,
but also the neuron inter-connectivity. This problem requires an approach which addresses NP-hard optimization problems.
The technique employed in this investigation is the evolutionary programming method. It may be argued that desired
portions of the network are needlessly altered by modifying all of the free parameters during the search process. As will be
seen in Section 4, the evolutionary search is conducted in a manner that does not simultaneously affect all of the network
parameters. It is expected that surviving members of the population will retain beneficial structure and parameters.

Fig. 3. (a) A general feedforward network architecture and (b) its associated connectivity matrix.
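A forward pass through such a masked cascade network can be sketched as follows. This is an illustration under assumptions rather than the authors' code: units are indexed from zero with inputs first, then hidden units, then outputs; `W[i][j]` and `C[i][j]` hold the weight and the 0/1 connectivity flag from unit i to unit j; bias terms are omitted (a constant input of 1 could be appended to supply them); and one activation function is shared by all units.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, W, C, n_hidden, n_out, g=sigmoid):
    """Evaluate a feedforward net whose unit j sums c_ij * w_ij * x_i
    over earlier units i (inputs first, then hidden units).
    Output units read inputs and hidden units but not each other."""
    m = len(x)
    act = list(x) + [0.0] * (n_hidden + n_out)
    for j in range(m, m + n_hidden + n_out):
        last = min(j, m + n_hidden)  # outputs stop at the last hidden unit
        net = sum(C[i][j] * W[i][j] * act[i] for i in range(last))
        act[j] = g(net)
    return act[m + n_hidden:]
```

Setting an entry of `C` to zero removes that link from the sum without touching the stored weight, so a later flip back to one restores the connection with its previous value.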


It is interesting to note that evolutionary search strategies are generally robust with respect to a broad class of
problems. Nontrivial constraints can be incorporated into the objective function in an effort to take advantage of a priori
knowledge about the problem domain. It should also be noted that evolutionary optimization strategies tend to be slower
than more deterministic optimization approaches. However, their "time complexity quite often grows in a linear manner
together with the problem size."


4.  EVOLVING  ANN  PATTERN  CLASSIFIERS 

As previously stated, EP provides a powerful relaxation search. To improve search efficiency, algorithm 1 of
Solis & Wets has been embedded in the search process in parallel with the offspring method normally incorporated in EP.
A connection modification mechanism is implemented by changing the state of a randomly selected synapse. For example,
if C[i][j] = 1 then the bit is flipped to 0. Likewise, if C[i][j] = 0 then the bit is flipped to 1. The structure modification
procedure must be employed with caution. Frequent structural modifications, say every generation, will cause the network
to evolve with a high error and no connections. This happens when learning is slow to occur with respect to the frequency
of structural modifications. As a result, the connection matrix is modified only every K generations (K = 5-10 for this work).
These changes are manifested in the evolutionary search strategy by replacing step 4 in the EP algorithm above with

4. Generate offspring

i. Modify the N highest ranked elements $(x_0, \ldots, x_{N-1})$ using the Solis & Wets algorithm 1
ii. Perturb the weight set using the EP scheme discussed above and modify the connectivity
structure of $(x_N, \ldots, x_{2N-1})$ by flipping a randomly selected bit in the connectivity matrix
every K generations
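The structural part of the modified step 4 amounts to a single bit flip applied on a schedule. A minimal sketch follows; the function names and the `allowed` list of legal (from, to) index pairs are hypothetical, and skipping generation 0 is an arbitrary choice.

```python
import random


def flip_random_connection(C, allowed, rng):
    """Toggle one randomly chosen entry of the 0/1 matrix C in place.
    `allowed` lists the (i, j) pairs that may legally hold a connection."""
    i, j = rng.choice(allowed)
    C[i][j] = 1 - C[i][j]
    return i, j


def maybe_mutate_structure(C, allowed, generation, K, rng):
    """Apply a single bit flip only on every K-th generation."""
    if generation > 0 and generation % K == 0:
        return flip_random_connection(C, allowed, rng)
    return None
```

Restricting `allowed` to pairs with i < j keeps the network feedforward, and a larger K gives the weights more generations to adapt between structural changes.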

By replacing the parent networks with offspring generated using the Solis & Wets algorithm, it is guaranteed that the
objective function will be non-increasing, that is, $\Delta J_i \le 0$. New structures are generated in the offspring, with
"good" structures being propagated at the parent level. A relatively small population size (with a single offspring per
parent) is used in this investigation. A single generation of the evolutionary search procedure is shown in Fig. 5.
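The non-increasing property follows from accepting a trial point only when it lowers the cost. The sketch below is a simplified random search with a bias vector in the spirit of Solis & Wets algorithm 1, not a faithful reproduction: the step size `sigma` is held fixed, and the bias update constants 0.2, 0.4, and 0.5 are commonly quoted settings that should be treated as assumptions here.

```python
import random


def solis_wets_step(f, x, fx, bias, sigma, rng):
    """One trial: try x + d, then x - d; accept only an improvement."""
    d = [rng.gauss(b, sigma) for b in bias]
    forward = [xi + di for xi, di in zip(x, d)]
    f_fwd = f(forward)
    if f_fwd < fx:
        new_bias = [0.2 * b + 0.4 * di for b, di in zip(bias, d)]
        return forward, f_fwd, new_bias
    backward = [xi - di for xi, di in zip(x, d)]
    f_bwd = f(backward)
    if f_bwd < fx:
        new_bias = [b - 0.4 * di for b, di in zip(bias, d)]
        return backward, f_bwd, new_bias
    # Neither direction helped: keep x and shrink the bias.
    return x, fx, [0.5 * b for b in bias]


def solis_wets(f, x0, iterations=500, sigma=0.1, seed=0):
    """Run repeated trials and record the cost after each one."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    bias = [0.0] * len(x)
    history = [fx]
    for _ in range(iterations):
        x, fx, bias = solis_wets_step(f, x, fx, bias, sigma, rng)
        history.append(fx)
    return x, fx, history
```

The returned `history` never rises from one entry to the next, which is the behaviour relied on when parents are replaced by their Solis & Wets offspring.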

The cost function employed in this study is a heuristic based on Akaike's information criterion (AIC). This
information criterion addresses model order by incorporating the number of free parameters to be determined for a particular
model selection along with the maximum likelihood estimate. Other information criteria such as the minimum description
length (MDL) could also have been used for the objective function.

The AIC for an autoregressive moving-average (ARMA) model with AR order p and MA order q as described by

$$AIC(p,q) = P \log(\hat{\sigma}^2) + 2(p+q)$$

has been modified for use as the objective function so that

$$AIC(N_c) = P \log(\hat{\sigma}^2) + 2N_c$$

where P is the effective number of observations or patterns and $N_c$ is the number of connections. Lower AIC values indicate
better models. Thus, the goal of the evolutionary search process is to minimize $AIC(N_c)$ by searching over the weight and
connectivity spaces. The maximum likelihood estimate of the variance is determined in the usual way

$$\hat{\sigma}^2 = \frac{1}{P} \sum_{p} \sum_{i} \left( t_{pi} - o_{pi} \right)^2$$

where $t_{pi}$ is the desired target and $o_{pi}$ is the actual output of neuron i for input pattern p. For the present study, it is suspected
that this information criterion insufficiently addresses the number of free parameters utilized in the network. Nevertheless,
it is employed without regard for the number of hidden units implemented in the network.
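The cost can be computed in a few lines. In this sketch the variance estimate is taken to be the squared error summed over outputs and averaged over the P patterns, which is an assumption about the normalization; the `floor` argument is also an addition, included only so that a perfect fit does not produce log(0).

```python
import math


def ml_variance(targets, outputs):
    """Squared error summed over outputs, averaged over patterns."""
    total = 0.0
    for t_p, o_p in zip(targets, outputs):
        total += sum((t - o) ** 2 for t, o in zip(t_p, o_p))
    return total / len(targets)


def aic_cost(targets, outputs, n_connections, floor=1e-12):
    """AIC(N_c) = P * log(variance) + 2 * N_c (lower is better)."""
    p = len(targets)
    variance = max(ml_variance(targets, outputs), floor)
    return p * math.log(variance) + 2 * n_connections
```

For a fixed error, each connection removed lowers the cost by 2, which is why the search keeps pruning links after the error has levelled off. With only a handful of patterns the log term is weighted lightly, so the balance between fit and size depends strongly on P.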


Fig. 5. One generation of the hybrid multi-agent stochastic search. This method
is a combination of both EP and the Solis & Wets technique.


5. RESULTS

5.1  The  parity  problem 

The parity problem serves as a popular benchmark since the mapping is not linearly separable. Tests were
conducted for the three-bit parity problem, which consists of eight exemplars. Sigmoidal activation functions were
incorporated in the network. Ten parent networks were employed, each having a single offspring. The networks were
initialized to be fully connected at the start of the run to five hidden units. An example evolutionary optimization run is
shown in Fig. 6, which shows the number of connections and MSE of the network with the least cost at each generation.
Even after the MSE reaches an acceptable level the optimization procedure continues to reduce the number of connections.
It is interesting to observe which part of the search yields the best network at each generation. For the run shown in Fig. 6,
Fig. 7 gives the index of the network with the lowest cost. Indices below 10 indicate the Solis & Wets algorithm generates
better nets and indices equal to or greater than 10 illustrate that the EP weight perturbation/connection modification
strategy generates lower cost nets. It should be noted that the network generated in this run is essentially a two-layer
network with shortcut connections from the input units to the hidden units as shown in Fig. 8. Recall that a traditional
fully-connected architecture with a single hidden layer would result in 16 connections whereas the network shown in Fig. 8
has 13.
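For reference, the eight exemplars of the three-bit parity problem can be generated as follows. The target convention used here (1 for an odd number of set bits) is an assumption, since the text does not say which parity is mapped to 1.

```python
from itertools import product


def parity_exemplars(n_bits=3):
    """Return (inputs, target) pairs; target is 1 for an odd number of 1s."""
    return [(bits, sum(bits) % 2) for bits in product((0, 1), repeat=n_bits)]
```

Flipping any single input bit flips the target, which is what prevents a single hyperplane from separating the two classes.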


Fig. 6. The evolutionary optimization procedure applied to the 3-bit parity problem. The network with the lowest
cost is shown at each generation.

Fig. 7. The NN index indicates the network rank in the population. The rank of the network with the
lowest cost is shown at each generation.


Fig. 8. Resulting network configuration for the 3-bit parity problem.


5.2 A two-class Gaussian problem

Let the function f(x,y) be jointly normal as denoted by $N(\mu_x, \mu_y, \sigma_x, \sigma_y)$. A small sample of 50 observations is taken
from class 1 as defined by N(-1.5, 0, 1, 1) and class 2 as defined by N(1.5, 0, 1, 1), respectively. Using sigmoidal activation
functions, a network was evolved to distinguish between the two classes given a single (x,y) observation. Again, this
problem is of academic interest due to its nonlinear separability requirement. A modification was made in the evolutionary
learning procedure for this mapping. Since the Solis & Wets method is a powerful random optimization technique in its
own right, it was used to generate offspring from (Solis & Wets) modified parent networks in lieu of generating offspring in
the traditional EP fashion. The EP framework (i.e., competitive annealing) was still used to retain good network structures.

The cost and MSE of the best network at each generation are shown in Fig. 9. Fig. 10 shows the index of the best
network at every generation. Using the scheme described above, the offspring networks will always be equivalent to or
better than the parent networks unless the connectivity structure is modified. The network was initially fully connected to
10 hidden units (88 connections). After 5000 generations only 38 connections remained, with an MSE of roughly 0.0003. The
resulting configuration is shown in Fig. 11. Fig. 12 shows the limited number of samples from each class superimposed on
the contour plot. Duda and Hart give more deterministic methods for formulating discriminants if normal densities are
assumed.


Fig. 9. Evolutionary optimization for a two-class Gaussian data set.

Fig. 10. The network with the lowest cost at each generation. Networks 1-10 and 11-20 correspond
to the parent and offspring networks, respectively.


Fig. 11. The resulting network for the two-class Gaussian problem after 5000 generations. The
x's indicate connections whereas the o's indicate links which are not connected. Note units 9 and
12 have not been incorporated.

Fig. 12. The two classes of data superimposed on a contour plot of the decision surface. One class has
an output value of 1 and the other class has an output value of 0.

5.3  The  T-C  classifier 

The final set of computer experiments investigates the classification of binary T-C images which have different
scaling and rotation as shown in Fig. 13. The eight 25-bit "T" patterns were designated as a separate class from the eight
25-bit "C" patterns. Starting with a population of fully-connected networks with 10 hidden units, it generally took less than 500
generations to evolve a network which distinguishes between the given "T" and "C" patterns. Fig. 14 shows the
evolutionary optimization process and Fig. 16 shows the resulting network. Since the center pixel is always 1, it is
interesting to note that the input from this pixel was not disconnected even though it does not provide any discriminatory
information between the two patterns. It is speculated that this probably would occur with an increased number of learning
generations. For the network shown in Fig. 14, the number of connections was reduced roughly 11% (from 341 to 302).
However, the MSE appears acceptable within the short number of generations. Most of the trials generated an 11-13%
reduction in 500 generations.
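The initial connection counts quoted in the results (88 for the Gaussian problem and 341 for the T-C problem) can be reproduced with a short calculation. The sketch assumes a single output unit and a bias that acts as one extra input feeding every hidden and output unit. Neither assumption is stated in the text; they are inferred because they reproduce both figures.

```python
def cascade_connections(n_inputs, n_hidden, n_outputs=1, bias=True):
    """Count links when every hidden unit sees all inputs, the bias, and
    all earlier hidden units, and every output sees inputs, bias, hiddens."""
    fan_in = n_inputs + (1 if bias else 0)
    hidden_links = sum(fan_in + k for k in range(n_hidden))
    output_links = n_outputs * (fan_in + n_hidden)
    return hidden_links + output_links


def percent_reduction(before, after):
    """Relative drop in connection count, in percent."""
    return 100.0 * (before - after) / before
```

With 2 inputs and 10 hidden units the count is 75 + 13 = 88, and with 25 inputs and 10 hidden units it is 305 + 36 = 341. Pruning from 341 to 302 connections removes 39 links, or about 11.4%.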


Fig. 13. Binary T-C patterns rotated and scaled on a 5x5 grid.

Fig. 14. Evolutionary optimization for the T-C patterns
shown in Fig. 13. Training was arbitrarily stopped at
500 generations. It appears that the cost function is
still being minimized by disconnecting neurons.


Fig. 15. The ranking of the best network in the
population at each generation. As in the previous
section a dual Solis & Wets approach was used to both
replace the parents and generate offspring.

Fig. 16. The evolved network for solving the T-C pattern problem given in Fig. 13. It is very likely that additional
links would have been disconnected given more learning generations.


6.  CONCLUSION 


The network designer imposes certain constraints in selecting a network architecture before training. These
constraints are manifested in the topology which describes the inter-neuron connections, the number of neurons and the
activation function. Once the network architecture is arbitrarily chosen, a weight set is found which optimizes the model
for the desired mapping. Constructive techniques such as projection pursuit regression and cascade-correlation
architectures serve as a means to relax the constraints imposed by the designer. The global optimization capabilities of
evolutionary search methods can be used to generate subsets of cascade-correlation style architectures by simultaneously
searching over weight and neuron connectivity spaces. Since these techniques are stochastic and global in nature, once
good solutions are found they may be optimized using local methods.

The evolutionary optimization approach outlined in this paper is extremely versatile. Although static activation
functions were employed for each neuron, changing the activation function did not require modification of the optimization
code. Spurious connections normally generated by evolutionary construction of networks were not observed. Reasonable
results were consistently found for the types of patterns classified in this work even though a small number of parent
networks were used. Additional work will ascertain the better generalization capabilities, if any, achieved using
parsimonious structures generated by this approach. Further work will also apply this technique to more difficult mappings
such as the two-spiral classification problem as well as classification problems of Naval interest.